Skip to content

Conversation

@seer-by-sentry
Copy link
Contributor

Fixes DEVSERVICES-7Y. The issue was that: _is_program_running fails to catch socket connection errors when supervisor is unavailable, crashing concurrent threads.

  • Update _is_program_running to catch SupervisorConnectionError, socket.error, and ConnectionRefusedError, assuming the process is not running if the supervisor is unreachable.
  • Update shutdown to silently pass if connection errors occur, indicating the supervisor is already down.
  • Update stop_process to silently pass if connection errors occur, indicating the supervisor is unavailable and the process is stopped.

This fix was generated by Seer in Sentry, triggered by [email protected]. 👁️ Run ID: 7674042

Not quite right? Click here to continue debugging with Seer.

@hubertdeng123 hubertdeng123 marked this pull request as ready for review December 17, 2025 22:37
Comment on lines 300 to 308
self._get_rpc_client().supervisor.shutdown()
except xmlrpc.client.Fault as e:
raise SupervisorError(f"Failed to stop supervisor: {e.faultString}")
except (SupervisorConnectionError, socket.error, ConnectionRefusedError):
# Supervisor is already down, nothing to do
pass

def start_process(self, name: str) -> None:
if self._is_program_running(name):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The start_process method lacks error handling for connection failures, creating a race condition where a thread can crash if the supervisor becomes unavailable after the initial status check.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

A race condition exists in the start_process method. The function first checks if a process is running and then attempts to start it. However, the supervisor daemon could become unavailable between these two operations. Because the underlying XML-RPC client connects lazily, a socket.error or ConnectionRefusedError can be raised during the start attempt. The current implementation only handles xmlrpc.client.Fault, leaving these connection errors unhandled. This is especially problematic as start_process is called from concurrent threads in up.py, which could lead to thread crashes.

💡 Suggested Fix

Update the try...except block in the start_process method to also catch SupervisorConnectionError, socket.error, and ConnectionRefusedError, similar to the error handling already implemented in the stop_process method. This will prevent thread crashes if the supervisor daemon is unreachable.

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: devservices/utils/supervisor.py#L300-L308

Potential issue: A race condition exists in the `start_process` method. The function
first checks if a process is running and then attempts to start it. However, the
supervisor daemon could become unavailable between these two operations. Because the
underlying XML-RPC client connects lazily, a `socket.error` or `ConnectionRefusedError`
can be raised during the start attempt. The current implementation only handles
`xmlrpc.client.Fault`, leaving these connection errors unhandled. This is especially
problematic as `start_process` is called from concurrent threads in `up.py`, which could
lead to thread crashes.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 7674921

@hubertdeng123 hubertdeng123 merged commit 3f48b8e into main Dec 17, 2025
14 checks passed
@hubertdeng123 hubertdeng123 deleted the seer/fix/supervisor-connection-handling branch December 17, 2025 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants